The statistical behavior of the size (or mass) of the largest cluster in subcritical percolation on
a finite lattice of size $N$ is investigated (below the upper critical dimension, presumably $d_c = 6$). It
is argued that as $N \to \infty$ the cumulative distribution function converges to the Fisher-Tippett (or
Gumbel) distribution $e^{-e^{-z}}$ in a certain weak sense (when suitably normalized). The mean grows
like $s_\xi^* \log N$, where $s_\xi^*(p)$ is a "crossover size". The standard deviation is bounded near $s_\xi^* \pi/\sqrt{6}$
with persistent fluctuations due to discreteness. These predictions are verified by Monte Carlo
simulations on $d = 2$ square lattices of up to 30 million sites, which also reveal finite-size scaling.
The results are explained in terms of a flow in the space of probability distributions as $N \to \infty$. The
subcritical segment of the physical manifold ($0 < p < p_c$) approaches a line of limit cycles where the
flow is approximately described by a "renormalization group" from the classical theory of extreme
order statistics.
PACS: 64.60.Ak,02.50.-r 

I. INTRODUCTION 

In the latter half of this century, percolation has become the canonical model of quenched spatial disorder [?].
Among its many areas of application are polymer gelation, hopping conduction in semiconductors and flow in porous
media [?]. Percolation has also attracted the attention of mathematicians because it offers challenging problems in
probability theory of relevance to statistical physics [?]. Since rigorous results are often not easily obtained, however,
computer simulation has played a central role in the motivation and testing of new theoretical ideas [?].

Most analytical and numerical studies have examined the critical point ($p = p_c$) where the correlation length $\xi(p)$
diverges, but here we focus on subcritical percolation ($p < p_c$) characterized by $\xi < \infty$. In this case, it is known that
the cluster-size distribution $n_s(p)$, the number of clusters of size (or mass) $s$ per site of an infinite hypercubic lattice
of coordination $z$, decays exponentially for all $p < 1/(z - 1) < p_c$ [?]

$\log n_s \asymp -s$ as $s \to \infty$, (1)

where $a_n \asymp b_n$ means "$a_n$ scales like $b_n$", or more precisely

$0 < \liminf_{n \to \infty} a_n/b_n \le \limsup_{n \to \infty} a_n/b_n < \infty$. (2)

(The quantity $P_s = n_s s$, which is the probability that the origin is part of a cluster of size $s$, is also sometimes called
the "cluster-size distribution" [?].) The total number of finite clusters per lattice site at the critical point $n_c =
\sum_s n_s(p_c)$ is known analytically for $d = 2$ bond percolation [?,9] and numerically for site and bond percolation for
various lattices in $d = 2$ and $d = 3$ [?]. Universal finite-size corrections to $n_c$ have also been studied extensively [10]-[13].
Beyond the rigorous result (1), it is believed that the cluster-size distribution decays exponentially for all $p < p_c$
with a characteristic size $s_\xi(p)$ and a power-law prefactor

$n_s(p) \propto s^{-\theta} e^{-s/s_\xi}$ as $s/s_\xi \to \infty$, (3)



where the exponent $\theta$ is supposed to be independent of $p$ with $\theta = 1$ for $d = 2$ and $\theta = 3/2$ for $d = 3$, respectively [1].
The quantity $s_\xi$ in (3) is called the "crossover size" since large clusters ($s \gg 1$) of size much smaller than $s_\xi$ behave
"critically", while much larger clusters behave "subcritically", as explained below. Because large clusters are fractal
objects, the crossover size and the correlation length are related by $s_\xi \propto \xi^D$, where $D < d$ is the fractal dimension of
the infinite cluster at $p = p_c$.

In contrast to the cluster-size distribution, relatively little is known about the size of the largest cluster $S_{(N)}$ in a
finite system of size $N = L^d$ for $p < p_c$, with the notable exception of the recent work of Borgs et al. [14]. (Our notation
for the random variable $S_{(N)}$ is explained below.) It is widely believed that the mean largest-cluster size $\mu_N = E[S_{(N)}]$
scales like $\mu_N \propto s_\xi \log N$ for $p < p_c$. This follows from the heuristic argument $N n_{\mu_N} \approx 1$, which supposes that the
largest cluster can be placed independently at any site in the lattice [?]. (This useful idea is extended significantly in
section II below.) Recently, from certain scaling axioms verified for $d = 2$ and believed to hold for $d < d_c = 6$, Borgs
et al. have proved the somewhat weaker statement $\mu_{L^d}/\xi'^D \asymp \log(L/\xi')$ as $L/\xi' \to \infty$, or equivalently

$\mu_N/s_\xi' \asymp \log(N/s_\xi')$ as $N/s_\xi' \to \infty$, (4)

where $\xi'$ is another correlation length defined in terms of "sponge-crossing probabilities" and $s_\xi' = \xi'^D$ is a
corresponding crossover size [14]. (Note that $d < d_c$ is assumed throughout this paper.)
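
The heuristic estimate of the mean quoted above can be written out in one line. The following is only a sketch: it assumes the conjectured tail form (3) with a hypothetical $p$-dependent amplitude $A$ (a symbol introduced here purely for illustration) and treats the expected number of clusters of size $\mu_N$ in the whole lattice as being of order one.

```latex
N\, n_{\mu_N} \approx 1
\;\Longrightarrow\;
N A\, \mu_N^{-\theta}\, e^{-\mu_N/s_\xi} \approx 1
\;\Longrightarrow\;
\mu_N \approx s_\xi \left[ \log N + \log A - \theta \log \mu_N \right]
      = s_\xi \log N \left[ 1 + O\!\left( \frac{\log\log N}{\log N} \right) \right].
```

Under these assumptions the power-law prefactor and the amplitude only affect the sub-leading terms, which is why the leading behavior $\mu_N \propto s_\xi \log N$ does not depend on $\theta$.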

In applications $S_{(N)}$ provides a measure of the maximum connectivity of a random medium, which is of fundamental
interest in the subcritical regime. From a theoretical point of view, the "strength" (or concentration) of the largest
cluster $S_{(N)}/N$ plays the role of an order parameter since its expected value in the "thermodynamic limit"

$P_\infty(p) = \lim_{N \to \infty} \mu_N/N$ (5)

has a discontinuous slope at $p = p_c$ with $P_\infty(p < p_c) = 0$ and $P_\infty(p > p_c) > 0$. Beyond the limiting behavior of the
mean $\mu_N$, however, a much more complete understanding of the percolation transition is contained in the cumulative
distribution function (c.d.f.) of the largest-cluster size

$F_N(s) = \mathrm{Prob}(S_{(N)} \le s)$ (6)

which also describes all size-dependent fluctuations of the order parameter. In this sense, the behavior of $F_N(s)$ near
the critical point fully describes the "birth of the infinite cluster" [14]. Beginning with the same scaling axioms as in
deriving (4), Borgs et al. have also proved that $F_N(s)$ varies significantly only on the scale of the mean for $p < p_c$

$\lim_{\epsilon \to 0} \liminf_{N/s_\xi' \to \infty} \left[ F_N\left(\epsilon^{-1} s_\xi' \log(N/s_\xi')\right) - F_N\left(\epsilon\, s_\xi' \log(N/s_\xi')\right) \right] = 1$. (7)

It is believed that (4) and (7) would also hold with the usual definition of $\xi$ as the decay length of the pair correlation
function [14], so we expect $\xi' \propto \xi$ and $s_\xi' \propto s_\xi$.

Although (4) and (7) provide important rigorous justification for the logarithmic scaling of the mean $\mu_N$, the
shape of the distribution $F_N(s)$ and scaling of the variance $\sigma_N^2 = \mathrm{Var}[S_{(N)}]$ appear not to have been studied (either
numerically or analytically) before this work. Moreover, no connections have yet been made between subcritical
percolation and the classical limit theorems of probability theory. Such fruitful connections, which are known to
explain Gaussian fluctuations away from the critical point in thermal phase transitions [15], would presumably come
from the statistical theory of extremes [16]-[20].

The article is organized as follows. First, in order to build the reader's physical intuition, simple approximations
are made in section II to derive the asymptotic behavior of $F_N(s)$ and propose finite-size scaling laws for $\mu_N$ and $\sigma_N$.
In section III, these predictions are verified for the $d = 2$ square lattice with computer simulations, which also provide
empirical functional forms and numerical parameters for the scaling laws. Finally, in section IV the preceding results
are explained in terms of a "subcritical renormalization group".



II. SIMPLE ARGUMENTS 



A. Connection with Extreme Order Statistics 



Consider site percolation on a periodic, hypercubic lattice of $N = L^d$ sites. Since any cluster can be uniquely
identified with the site nearest to its center of mass (of lowest index, if there is more than one such site), we can define
a set of $N$ identically distributed random variables (r.v.) $\{S_i\}$ such that $S_i = s$ if the largest
cluster centered at site $i$ has size $s$ and $S_i = 0$ if no cluster is centered there. (The $S_i$ are correlated, so they are
not independent, identically distributed (i.i.d.) r.v.; this point is taken up in section II B.) Clearly, the most probable
value of $S_i$ is zero, since the number of clusters is always much less than the number of sites, and it is exceedingly rare
to have more than one cluster centered at the same site, e.g. when one cluster encircles another.

We seek the c.d.f. $F_N(s)$ of the "extreme order statistic" [19,20]

$S_{(N)} = \max_{1 \le i \le N} S_i$ (8)

in the limit $N \to \infty$ with $p < p_c$ fixed. Extreme order statistics have many classical applications, such as the fracture
strength of solids, the occurrence of manufacturing defects and the frequency of extreme weather [19]. More recently in
statistical physics, extreme order statistics have been applied to glassy relaxation on fractal structures [21], the dynamics
of elastic manifolds in random media [22,23], the random energy model [24,25], decaying Burgers turbulence [24], dispersive
transport in amorphous materials [21] and random sequential adsorption [27]. In such applications, extreme order
statistics are used to describe the most important features of a random energy landscape, e.g. lowest activation energy
barriers.
In this work, we show that the theory of extreme order statistics also has relevance for the geometric features of
random systems. In one dimension, the largest cluster in percolation bears some resemblance to the longest increasing
subsequence of a random permutation, which is known to exhibit similar limiting statistics (see Ref. [?] for a recent
review), although the former problem is much simpler [29]. The interesting cases of percolation, however,
are in higher dimensions, which we address here.



B. A First Approximation Based on Independence 

The main difficulty in the percolation problem for $S_{(N)}$, aside from the complexity of the parent distribution, is
that the r.v. $\{S_i\}$ are correlated. Much is known about order statistics of i.i.d. r.v. [19], but dependent r.v. have been
studied mostly in cases much simpler than percolation [?]. Nevertheless, considerable insight is gained by neglecting
correlations in deriving an asymptotic form of $F_N(s)$, which will be justified below in section IV. As one might expect,
correlations in the subcritical regime are too weak to have an effect in the thermodynamic limit.

Whenever $N \gg s_\xi$, which holds in the limit $p \to 0$ for fixed $N \gg 1$, cluster sizes comparable to the system size
are exponentially rare according to (1). Since correlations between the r.v. $\{S_i\}$ arise due to excluded volume effects
(see below), $\mathrm{Cov}[S_i, S_j]$ is exponentially small for most pairs of sites $(i, j)$ in this limit. Therefore, as a natural first
approximation we assume $N$ independent selections from a continuous parent distribution with exponential decay

$\mathrm{Prob}(S_i \le s) \sim 1 - e^{-s/s_\xi^*}$ as $s \to \infty$ (9)

where $s_\xi^*(p)$ is an effective crossover size (see below). Note that the asymptotic distribution of the maximum of i.i.d.
r.v. is entirely determined by the tail of the parent distribution [17,19], so the complicated behavior of $S_i$ for small
sizes is irrelevant. From the method of Cramér [19] applied to (9), we quickly find

$F_N(s) \sim \left(1 - e^{-s/s_\xi^*}\right)^N = \left[1 - \frac{1}{N} e^{-(s - s_\xi^* \log N)/s_\xi^*}\right]^N$ (10)

which implies

$\lim_{N \to \infty} G_N(z) = e^{-e^{-z}}$ (11)

where

$G_N(z) = F_N(s_\xi^* z + s_\xi^* \log N) = \mathrm{Prob}(S_{(N)}/s_\xi^* \le z + \log N)$ (12)



is a normalized c.d.f. Therefore, in this simple approximation the largest-cluster size is sampled from the
Fisher-Tippett distribution [?] with c.d.f. $e^{-e^{-z}}$, mean $\gamma = 0.5772\ldots$ (Euler's constant) and variance $\pi^2/6$ [17]; the mean
largest-cluster size grows logarithmically $\mu_N/s_\xi^* \sim \log N + \gamma$, while the standard deviation converges to a certain
constant $\sigma_N/s_\xi^* \to \pi/\sqrt{6}$. Comparing with (4), we can view the leading-order asymptotic behavior of the mean

$\mu_N/s_\xi^* \sim \log N$ as $N \to \infty$ (13)

as defining the effective crossover size $s_\xi^*$ (should it exist), which is presumably proportional to the others introduced

$s_\xi^* \propto s_\xi' \propto s_\xi$.
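
The limiting mean and standard deviation quoted above can be checked without any sampling, because the maximum of $N$ independent exponential variables has exactly known moments: its mean is $s_\xi^*$ times the harmonic number $H_N$ and its variance is $s_\xi^{*2} \sum_{k \le N} k^{-2}$. The sketch below assumes a purely exponential parent distribution (not just an exponential tail as in (9)); the function name and the value of `s_star` are illustrative choices, not part of the original analysis.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant, gamma in the text


def max_exponential_moments(n, s_star=1.0):
    """Exact mean and standard deviation of the maximum of n i.i.d.
    exponential variables with scale s_star (assumed parent distribution)."""
    harmonic_1 = sum(1.0 / k for k in range(1, n + 1))      # H_n
    harmonic_2 = sum(1.0 / k ** 2 for k in range(1, n + 1))  # sum of 1/k^2
    return s_star * harmonic_1, s_star * math.sqrt(harmonic_2)


if __name__ == "__main__":
    s_star = 1.313  # illustrative crossover size
    for n in (10, 1000, 100000):
        mean, std = max_exponential_moments(n, s_star)
        # mean/s_star - log(n) approaches gamma; std/s_star approaches pi/sqrt(6)
        print(n, mean / s_star - math.log(n), std / s_star)
```

The printed shift of the mean approaches $\gamma$ and the scaled standard deviation approaches $\pi/\sqrt{6}$ as $N$ grows, in line with (13) and the surrounding discussion.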



C. Corrections Due to Discreteness 



There appears to be a problem with (11) for percolation on a lattice: A discrete c.d.f. (which is a piecewise constant
function) cannot converge to a continuous function when scaled by a bounded standard deviation. In fact, since $s$ in
(9)-(10) is restricted to integer values, the limit in (11) does not exist. Instead, if we replace $s$ by $[s]$ (the nearest
integer to $s$) in (9), then the normalized c.d.f. $G_N(z)$ defined by (12) approaches a quasi-periodic sequence of piecewise
constant functions with period roughly $1/s_\xi^*$ in $\log N$,

$G_N(z) \sim \left[1 - \frac{1}{N} e^{-z + \delta_N(z)/s_\xi^*}\right]^N \sim e^{-e^{-z + \delta_N(z)/s_\xi^*}}$ as $N \to \infty$ (14)

where

$\delta_N(z) = s_\xi^*(z + \log N) - [s_\xi^*(z + \log N)]$. (15)

(The limiting sequence is strictly periodic only when $e^{1/s_\xi^*}$ is an integer.) The piecewise constant functions in (14)
converge weakly in the sense that as $N \to \infty$ the "step edges" periodically trace out two continuous functions

$\overline{G}(z) = \limsup_{N \to \infty} G_N(z) = e^{-e^{-z - 1/(2 s_\xi^*)}}$ (16a)

$\underline{G}(z) = \liminf_{N \to \infty} G_N(z) = e^{-e^{-z + 1/(2 s_\xi^*)}}$ (16b)

which define a stationary "envelope" of width $1/s_\xi^*$ about the Fisher-Tippett distribution. If we let $a$ be the lattice
spacing (which we take to be unity), then the envelope width would be $a^d/s_\xi^*$, showing that the lack of convergence
is controlled by the relative importance of discreteness on the scale of the crossover size. Note that the continuous
distribution (11) is recovered in the limit $p \to p_c$ (taken after the limit $N \to \infty$)

$\lim_{p \to p_c} \overline{G}(z) = \lim_{p \to p_c} \underline{G}(z) = e^{-e^{-z}}$, (17)

as the crossover size diverges and hence the envelope width vanishes.

For $s_\xi^* < \infty$ ($p < p_c$), the continuum result for the scaling of the mean (13) still holds, but the standard deviation
has persistent fluctuations due to discreteness

$\sigma_N/s_\xi^* \sim \pi/\sqrt{6} + \epsilon_N$ as $N \to \infty$ (18)

where $\epsilon_N$ is periodic in $\log N$ with period $1/s_\xi^*$. Because the limiting sequence (14) fluctuates periodically about a
certain fixed distribution, it can be viewed as a "limit cycle" in some appropriate Banach space (see below). Intuitively,
the distribution conforms asymptotically to the Fisher-Tippett distribution as closely as possible within the constraints
imposed by discreteness.
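
The envelope picture can be illustrated numerically. The sketch below evaluates the normalized c.d.f. (12) for the maximum of $N$ independent exponential sizes rounded to the nearest integer, and compares it with two shifted Fisher-Tippett curves. It assumes a symmetric envelope, $e^{-e^{-z \mp 1/(2 s_\xi^*)}}$, which is what nearest-integer rounding ($|\delta_N| \le 1/2$) implies; the function names and the sample value of `s_star` are hypothetical.

```python
import math


def discrete_gn(z, n_sites, s_star):
    """G_N(z) of eq. (12) when sizes in eq. (9) are rounded to integers."""
    s = s_star * (z + math.log(n_sites))   # argument of F_N in eq. (12)
    s_int = math.floor(s + 0.5)            # nearest integer [s]
    tail = math.exp(-s_int / s_star)       # Prob(S_i > [s]) from eq. (9)
    if tail >= 1.0:                        # outside the asymptotic regime
        return 0.0
    return math.exp(n_sites * math.log1p(-tail))  # (1 - tail)^N


def envelope(z, s_star):
    """Lower and upper Fisher-Tippett envelope curves (assumed symmetric)."""
    half = 0.5 / s_star
    lower = math.exp(-math.exp(-z + half))
    upper = math.exp(-math.exp(-z - half))
    return lower, upper


if __name__ == "__main__":
    s_star = 1.313  # illustrative value
    for n in (10 ** 3, 10 ** 6):
        for z in (-1.0, 0.0, 1.0, 2.0):
            lo, hi = envelope(z, s_star)
            print(n, z, lo, discrete_gn(z, n, s_star), hi)
```

For each $N$ the discrete value falls between the two envelope curves (up to a correction of order $1/N$), while its position inside the envelope moves quasi-periodically as $\log N$ changes.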



D. Corrections Due to Correlations

The simple derivation of (14) should be valid whenever $s_\xi \ll 1$ (or $s_\xi' \ll 1$ or $s_\xi^* \ll 1$) because then even a single
site qualifies as a large cluster. If $s_\xi \approx 1$, however, non-negligible correlations among the r.v. $\{S_i\}$ arise because a
cluster of size $s_\xi$ excludes on the order of $s_\xi$ nearby sites from being part of any other cluster. If $s_\xi \gg 1$, on the
order of $\xi^d \propto s_\xi^{d/D} \gg s_\xi$ sites are excluded by such a cluster since it engulfs many smaller, exterior regions due to
its fractal geometry ($D < d$). Therefore, correlations can be included heuristically by replacing $N$ with $N/s_\xi^\alpha$ in (14),
which simply shifts the mean by a constant $\Delta\mu/s_\xi^* = -\alpha \log s_\xi$ without affecting the leading-order scaling behavior
(13), where $\alpha = 0$ if $s_\xi \ll 1$, $\alpha = 1$ if $s_\xi \approx 1$ and $\alpha = d/D$ if $s_\xi \gg 1$. Note that the effect of correlations is negligible
for $N \gg s_\xi^\alpha$. Correlations do, however, control the finite-size scaling at smaller values of $N$.



E. Finite-Size Scaling 

There are only two relevant length scales in percolation, the correlation length $\xi$ and the lattice spacing $a$ (normalized
to unity), or equivalently two mass scales, the crossover size $s_\xi$ (or $s_\xi'$ or $s_\xi^*$) and the volume of a lattice cell $a^d$ (also
normalized to unity). If $\xi \gg a$, then discrete lattice effects on "large" clusters with sizes on the order of $s_\xi$ or larger
become negligible, and the system has only one relevant mass scale $s_\xi$. As a consequence of the single scale in the
limit $p \to p_c$, any function of $N$ and $s_\xi$ is expected to collapse into a self-similar form interpolating between a critical
power-law in $N$ valid for $1 \ll N \ll s_\xi^\alpha$ and a subcritical function of $N/s_\xi^\alpha$ (for some constant $\alpha$) valid for $1 \ll s_\xi^\alpha \ll N$.
For example, because $\mu_N(p)$ and $\sigma_N(p)$ have the dimensions of $s_\xi$, we have

$\mu_N/s_\xi = \Phi(N/s_\xi^\alpha)$ (19a)
$\sigma_N/s_\xi = \Psi(N/s_\xi^\alpha)$ (19b)

for some universal functions $\Phi(x)$ and $\Psi(x)$ which do not depend on $p$. In the critical regime $N \ll s_\xi^\alpha$, it is expected
that $\mu_N \propto N^{D/d}$ and that both $\mu_N$ and $\sigma_N$ are asymptotically independent of $s_\xi$, which implies $\alpha = d/D$ and
$\Phi(x) \propto \Psi(x) \propto x^{D/d}$ as $x \to 0$. From (4) and (14), we also expect $\Phi(x) \asymp \log x$ and $\Psi(x) \asymp 1$ as $x \to \infty$.

The classical idea behind the finite-size scaling ansatz (19) can be understood as follows. A large subcritical cluster
(on an infinite lattice) intersected with a finite box of side $L$ exhibits a crossover from "critical scaling" at small scales
$a \ll L \ll \xi$ (where a portion of it typically spans the box) to "subcritical scaling" at large scales $L \gg \xi$ (where it is
entirely contained within the box). Note that the lattice spacing $a$ is irrelevant as long as $\xi \gg a$; all systems with the
same ratio $L/\xi$ should have equivalent statistics, up to small corrections of order $a/\xi$ due to discreteness. Of course,
as $p \to 0$ the finite-size scaling ansatz breaks down, and discrete effects eventually dominate over correlation effects,
as explained above.

III. NUMERICAL RESULTS 
A. Methods 

In order to test the predictions of the previous section, numerical simulations are performed for site percolation
on periodic $d = 2$ square lattices of sizes $N = 5^2, 13^2, 31^2, 74^2, 129^2, 175^2, 415^2, 982^2, 2324^2$ and $5500^2$ with
$p = 0.05, 0.1, 0.15, \ldots, 0.5$ [?]. Note that the value $p_c = 0.592\,7460 \pm 0.000\,0005$ has been determined numerically in
this case [32]. For each $(N, p)$, between $M = 2 \times 10^5$ and $M = 10^8$ samples are generated, and clusters are identified
by a recursive "burning" algorithm [?,33]. With these methods, trillions of clusters are counted in several months of
CPU time on Silicon Graphics R-10,000 processors.
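
The measurement described in this paragraph can be sketched in a few lines. The example below is only an illustration of the procedure, not the code used for the reported simulations: it replaces the recursive burning algorithm with an equivalent iterative flood fill (to avoid deep recursion), uses Python's built-in generator rather than the one named in the text, and all function names are hypothetical.

```python
import random


def sample_lattice(side, p, rng):
    """Occupy each site of a side x side lattice independently with prob. p."""
    return [[rng.random() < p for _ in range(side)] for _ in range(side)]


def largest_cluster(occupied):
    """Size of the largest nearest-neighbour cluster, periodic boundaries."""
    side = len(occupied)
    seen = [[False] * side for _ in range(side)]
    best = 0
    for i in range(side):
        for j in range(side):
            if not occupied[i][j] or seen[i][j]:
                continue
            # "burn" the cluster containing (i, j) with an explicit stack
            seen[i][j] = True
            stack = [(i, j)]
            size = 0
            while stack:
                x, y = stack.pop()
                size += 1
                neighbours = (((x + 1) % side, y), ((x - 1) % side, y),
                              (x, (y + 1) % side), (x, (y - 1) % side))
                for nx, ny in neighbours:
                    if occupied[nx][ny] and not seen[nx][ny]:
                        seen[nx][ny] = True
                        stack.append((nx, ny))
            best = max(best, size)
    return best


if __name__ == "__main__":
    rng = random.Random(12345)  # arbitrary seed for reproducibility
    sizes = [largest_cluster(sample_lattice(31, 0.15, rng)) for _ in range(200)]
    print(sum(sizes) / len(sizes))  # sample mean of the largest-cluster size
```

Repeating the last step for many samples and lattice sizes yields empirical estimates of $F_N(s)$, $\mu_N$ and $\sigma_N$.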

In performing such large-scale simulations, special attention must be paid to the choice of (pseudo)random-number
generator [32,34]. With the standard 32-bit generator rand(), the largest observed cluster sizes tend to come in
multiples of integers $> 2$ (after accumulating data from a very large number of "random" realizations), which indicates
that the periodicity of the generator is having an artificial effect. In all the simulations reported here, however, the
48-bit generator drand48() is used, and the numerical cluster-size distributions $n_s(p)$ appear to be free of any systematic
errors.



B. Largest-Cluster Distributions 

The measured largest-cluster distributions are in very close agreement with the predictions of (14)-(16) for all $p < p_c$,
as shown in Fig. 1 for the case $p = 0.15$. In order to check the shape of the c.d.f. against $e^{-e^{-z}}$, the distributions
are normalized to have mean $\gamma$ and variance $\pi^2/6$, which differs somewhat from the normalization given above in
(12). As predicted by (16) the discrete c.d.f.s in Fig. 1(a) lie almost perfectly within a continuous envelope between
two Fisher-Tippett distributions. Likewise, the discrete probability density functions (p.d.f.) shown in Fig. 1(b) for
$p = 0.15$, which are simply the step heights in Fig. 1(a), exhibit the expected small fluctuations about the
Fisher-Tippett p.d.f. $e^{-z - e^{-z}}$ due to discreteness. Using the value $s_\xi^*(0.15) = 1.313$ (determined independently below), the
width of the envelope is seen to be very close to $1/s_\xi^*$. Note that the c.d.f.s in Fig. 1(a) are shifted slightly outside
the envelope by $\epsilon_N \sqrt{6}/(\pi s_\xi^*)$ because sizes have been scaled by $\sigma_N \sqrt{6}/\pi$ rather than by $s_\xi^*$. Overall, the agreement
between (14)-(16) and the simulation results is excellent for all the values of $p$ considered here, thus lending some
credence to the approximations of the previous section.

C. Cluster-Size Distributions 

In order to test the finite-size scaling laws (19), numerical values of the crossover size $s_\xi(p)$ are obtained by fitting the
cluster-size distributions $n_s(p)$ to (3). When compiling these distributions, unwanted finite-size effects are minimized
by requiring that $N^{1/d}$ exceed the largest observed cluster size (for a given value of $p$). With this restriction, a
single cluster cannot directly see the periodic boundary conditions. Motivated by (3), the tails of the cluster-size
distributions are fit to

$\log n_s = C - \theta \log s - s/s_\xi$ for $s > s_{\mathrm{tail}}$ (20)

Fitting to such an asymptotic form requires some care: The starting point of the fit $s_{\mathrm{tail}}$ must be large enough that
the asymptotic behavior is dominant but also small enough that the fit is not degraded by statistical fluctuations.
In this work $s_{\mathrm{tail}}$ is systematically chosen where $|ds_\xi/ds_{\mathrm{tail}}|$, $|d\theta/ds_{\mathrm{tail}}|$ and $\chi^2$ are minimal ($\chi^2 \approx 1$). The fit is
deemed reliable when the value thus obtained at $s_{\mathrm{tail}}$ is contained within all the other confidence intervals for fits
with $s_{\mathrm{tail}}' > s_{\mathrm{tail}}$. Because the raw distributions have bin counts ranging from over $10^{11}$ at size 1 down to 0 and 1 in
the large-size tails, the fitting cannot be done by the usual least-squares method, which assumes normally distributed
errors. Instead, the parameters $(C, \theta, s_\xi)$ are fit to the $n_s(p)$ data by Poisson regression, which properly handles the
discrete, rare events in the tail (using the package X-Lisp-Stat [35]).
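
To make the fitting step concrete, here is a minimal pure-Python sketch of Poisson regression for the tail form (20), using plain Newton iterations on the Poisson log-likelihood. It is not the X-Lisp-Stat procedure used for the reported fits; the function names, the undamped Newton scheme and the need for a reasonable starting guess `beta0` are all assumptions of this illustration.

```python
import math


def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def fit_tail(sizes, counts, beta0, steps=50):
    """Poisson regression of bin counts with mean exp(C - theta*log s - s/s_xi).

    beta0 = (C, theta, 1/s_xi) is a starting guess; returns (C, theta, s_xi).
    """
    beta = list(beta0)
    for _ in range(steps):
        grad = [0.0] * 3
        hess = [[0.0] * 3 for _ in range(3)]
        for s, y in zip(sizes, counts):
            x = (1.0, -math.log(s), -float(s))       # regressors of eq. (20)
            mu = math.exp(sum(b * xi for b, xi in zip(beta, x)))
            for i in range(3):
                grad[i] += (y - mu) * x[i]           # score of log-likelihood
                for j in range(3):
                    hess[i][j] += mu * x[i] * x[j]   # Fisher information
        delta = solve3(hess, grad)
        beta = [b + d for b, d in zip(beta, delta)]
        if max(abs(d) for d in delta) < 1e-12:
            break
    return beta[0], beta[1], 1.0 / beta[2]
```

Unlike least squares on $\log n_s$, this likelihood remains well defined when bins in the far tail contain zero or one cluster, which is the reason given above for preferring Poisson regression.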

The fitting results are given in Table I. Fitting errors grow as $p \to 0$ because less data is available to accurately
resolve the tail of $n_s(p)$ and also as $p \to p_c$ due to critical slowing down. Although the results for $s_\xi$ should be reliable,
the results for $\theta$ (not needed in this work) could change somewhat if different corrections to scaling were considered [?].
Therefore, the observed small deviation of $\theta$ from its conjectured value of 1 (for all $0 < p < p_c$) may only be an
artifact of the fit.



D. Scaling of the Mean and Variance 



As shown in Figs. 2-3, the collapse of the mean and standard deviation of the largest-cluster size plotted as $\mu_N/s_\xi$
and $\sigma_N/s_\xi$ versus $N/s_\xi^{d/D}$ (using $D = 91/48$ [?]) is nearly perfect for $p > 0.30$. As discrete-lattice effects become
important at smaller values of $p$, however, the data drifts off the universal curves, and the tiny oscillations predicted
by (18) begin to be visible in the standard deviation. These effects are most pronounced when $s_\xi < 1$ ($p < 0.10$) since
then the interpretation of $s_\xi^{d/D}$ as an excluded volume is meaningless and the second length scale $a = 1$ cannot be
ignored. Indeed, when $\mu/s_\xi$ and $\sigma/s_\xi$ are plotted versus $N/\max\{1, s_\xi^{d/D}\}$, as shown in Fig. 4, the data for $s_\xi < 1$ lies
much closer to the universal curves, consistent with the heuristic arguments given above in section II D.

From the simulation results with $s_\xi \gg 1$, the universal scaling functions in (19) for the $d = 2$ square lattice can be
determined numerically. For $p > 0.30$, the scaling function $\Phi(x)$ for the mean is fit to the empirical form:

$\Phi(x) = \left[ a_2 + \frac{a_3}{(a_4 + x)^{a_5}} \right] \log\left[ 1 + (x/a_1)^{D/d} \right]$ (21)

where the best parameter values (in the least squares sense) are $a_1 = 8.1 \pm 0.5$, $a_2 = 0.954 \pm 0.005$, $a_3 = 3.3 \pm 0.2$,
$a_4 = 1.0 \pm 0.3$ and $a_5 = 0.61 \pm 0.2$. The collapsed data in Fig. 2 shows a smooth crossover between the expected critical
and subcritical scaling laws, $\Phi(x) \propto x^{D/d}$ as $x \to 0$ and $\Phi(x) \sim a_2 \log(1 + (x/a_1)^{D/d}) \sim (a_2 D/d) \log x = 0.90 \log x$
as $x \to \infty$, respectively. The simulation result $\mu_N \sim 0.90\, s_\xi \log N$ justifies our definition of the effective crossover size
$s_\xi^*$ in (13) and for the case of the square lattice relates it to the crossover size in (3) via $s_\xi^* = 0.90\, s_\xi$.

Although the standard deviation appears to be bounded from the data shown in Fig. 3, we can only safely conclude
$\sigma_N = o(\log\log N)$ because the subcritical portion of the data only spans five decades in $N/s_\xi^{d/D}$ (due to memory
restrictions). Following the derivation in the next section, however, it can be proved [13] that $\sigma_N = O(1)$ follows from
very reasonable assumptions related to (4) and (7). Therefore, for $p > 0.30$ the scaling function $\Psi(x)$ for the standard
deviation is fit to the empirical form:

$\Psi(x) = \frac{b_2 b_3 \log\left(1 + (x/b_1)^{D/d}\right)}{1 + b_3 \log\left(1 + (x/b_1)^{D/d}\right)}$ (22)

where $b_1 = 8.4 \pm 0.8$, $b_2 = 1.23 \pm 0.01$ and $b_3 = 1.5 \pm 0.1$. Once again, as shown in Fig. 3, the collapsed data for
$s_\xi \gg 1$ fits closely the expected scaling laws $\Psi(x) \sim 0.25\, x^{D/d}$ as $x \to 0$ and $\Psi(x) \sim b_2 = 1.23$ as $x \to \infty$. Note that
$\sigma_N/s_\xi^* \to 1.23/0.90 = 1.36$ for $s_\xi \gg 1$, which differs from $\pi/\sqrt{6} = 1.2825\ldots$ by only 6.5%.
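
The two limits quoted for $\Psi(x)$ can be checked directly from the fitted parameters. The sketch below assumes the functional form $\Psi(x) = b_2 b_3 \ell/(1 + b_3 \ell)$ with $\ell = \log(1 + (x/b_1)^{D/d})$; the numerator is an assumption chosen so that the formula reproduces both stated limits, and the names used in the code are illustrative.

```python
import math

D_OVER_D = (91.0 / 48.0) / 2.0   # D/d for the d = 2 square lattice
B1, B2, B3 = 8.4, 1.23, 1.5      # fitted parameters quoted in the text


def psi(x):
    """Assumed empirical scaling function for sigma_N / s_xi."""
    t = B3 * math.log1p((x / B1) ** D_OVER_D)
    return B2 * t / (1.0 + t)


if __name__ == "__main__":
    small = 1e-9
    print(psi(small) / small ** D_OVER_D)   # critical amplitude, about 0.25
    print(psi(1e300))                        # subcritical plateau, about 1.23
    ratio = (B2 / 0.90) / (math.pi / math.sqrt(6.0))
    print(ratio - 1.0)                       # relative excess over pi/sqrt(6)
```

With these parameter values the small-$x$ amplitude is $b_2 b_3 / b_1^{D/d} \approx 0.25$ and the large-$x$ plateau is $b_2$, consistent with the scaling laws stated above.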



IV. SUBCRITICAL RENORMALIZATION 



A. Flow in the Space of Distributions 



There is a profound connection between renormalization-group (RG) concepts from the theory of critical phenomena
[36,37] and the limit theorems of probability theory through what one might call "renormalization of the order
parameter" (as opposed to "renormalization of the coupling constant" [38]). For many second-order phase transitions,
the appropriate order parameter is a sum or average of identical, correlated r.v. indexed by the sites of a lattice,
e.g. the total magnetization in the Ising model. In such cases the central limit theorem for i.i.d. r.v. describes the
behavior of the order parameter away from the critical point, where correlations are unimportant, and the mathematical
concept of a "stable distribution" [?] amounts to a fixed point of an RG in the space of probability
distributions of the order parameter [15,41].

In the case of percolation, although the appropriate order parameter is not the sum but rather the maximum of
certain r.v., RG concepts can still be applied. Consider the c.d.f. of the largest-cluster size $F_N(s)$, which we normalize
(or rather, successively "renormalize") as follows

$G_N(z) = F_N(\sigma_N z + \mu_N) = \mathrm{Prob}(Z_N \le z)$, (23)

where

$Z_N = \frac{S_{(N)} - \mu_N}{\sigma_N}$ (24)

is a r.v. with zero mean and unit variance. Note that since $S_{(N)}$ assumes only integer values, $G_N(z)$ is a piecewise
constant function of $z \in \mathbb{R}$ with discontinuities at a countable set of points $\{(s - \mu_N)/\sigma_N \mid s \in \mathbb{N}\}$ with equal spacing
$1/\sigma_N$.

The discrete mapping $G_N(z)$ can be viewed as a flow with increasing $N$ (in some appropriate Banach space, e.g.
$L^p$) which advects distributions towards various possible limiting behaviors. The subcritical portion of the flow is
depicted in Fig. 5. For each $N \in \mathbb{N}$, the set of normalized distributions $\{G_N\}$ parameterized by $0 < p < 1$ forms a
one-dimensional manifold, which we call the "physical manifold". The ends of the physical manifold corresponding
to $p = 0$ and $p = 1$ are pinned at trivial fixed points, which are unit step functions centered at $x = 0$ and $x = N$,
respectively (before normalization). Although these fixed points affect the nearby flow, every trajectory with $0 < p < 1$
eventually escapes toward one of three possible limiting behaviors for sufficiently large $N$: subcritical ($0 < p < p_c$),
critical ($p = p_c$) or supercritical ($p_c < p < 1$). The latter two cases will be considered elsewhere; here we focus on
subcritical behavior.

According to the heuristic arguments in section II and the simulation results in section III, the subcritical segment
of the physical manifold is advected into a line of limit cycles (12)-(18) around the Fisher-Tippett distribution once
the system size exceeds the crossover size $N \gg s_\xi$, or more precisely, $N \gg s_\xi^{d/D}$ ($L \gg \xi$). The envelope manifolds
$\overline{G}$ and $\underline{G}$ for $0 < p < p_c$ defined in (16) enclose the limit cycles. As sketched in Fig. 5, the "radius" of each limit
cycle grows as $p \to 0$ like $1/s_\xi^*(p)$, which reflects the influence of the $p = 0$ fixed point representing discreteness.
In the opposite limit $p \to p_c$ (in the subcritical regime $s_\xi^{d/D} = o(N)$), the envelope manifolds meet at a fixed point
corresponding to the continuous Fisher-Tippett distribution.

The approach to a fixed point is generally characterized by self-similarity, which holds "universally" for all
trajectories leading to it. In the present case of a lattice-based system, this asymptotic self-similarity can be described by
a real-space RG which relates the c.d.f. $G_N(z)$ for a system of size $N = mn$ to the c.d.f. for each of $n$ identical,
contiguous cells (or blocks) of size $m$

$G_{mn} = R_n G_m$ (25)

in the limit $m \to \infty$ with $n$ fixed, as shown in Fig. 6. As usual, the renormalization operators form an Abelian
semigroup under composition $R_{mn} = R_m \circ R_n = R_n \circ R_m$. These kinds of arguments are typically applied to a
coupling constant in the vicinity of a critical fixed point, where they capture the effect of long-range correlations [37].
They apply equally well, however, to the order-parameter distribution at a subcritical fixed point, where correlations
disappear.

In a system exhibiting a phase transition, there is a different RG of the form (25) valid near each of the various
fixed points. As shown in Fig. 5, subcritical trajectories with $s_\xi \ll 1$ pass by the $p = 0$ fixed point and quickly
become ensnared in the subcritical limit cycles, which are described by a RG given below. Such trajectories never
feel much influence from the critical fixed point because correlation effects are dominated by discrete-lattice effects,
due to proximity of the $p = 0$ fixed point. For larger values of $p < p_c$ such that $1 \ll s_\xi < \infty$, however, subcritical
trajectories at first approach the critical fixed point ($1 \ll N \ll s_\xi^{d/D}$) before crossing over to the subcritical limit
cycles ($N \gg s_\xi^{d/D}$). This crossover behavior was demonstrated for the mean and variance above in Figs. 2-3, but it
also holds for the shape of the distribution.

In the vicinity of the critical fixed point ($1 \ll N \ll s_\xi^{d/D}$), trajectories obey a different RG reflecting the dominance
of long-range correlations. The critical fixed point is unstable in the sense that subcritical and supercritical trajectories
eventually cross over to a different limiting behavior along the direction of an unstable manifold. One such "crossover
manifold" shown in Fig. 5, which connects the critical and subcritical fixed points, corresponds to the limits $N \to \infty$
and $p \to p_c$ with $N/s_\xi^{d/D} \to c$ for some constant $c > 0$. Likewise, the stable manifold converging to the critical fixed
point corresponds to the limit $N \to \infty$ with $p = p_c$.

B. The Subcritical Renormalization Group 

More than seventy years ago, Fréchet [16] and Fisher and Tippett deduced the possible limiting distributions
for extremes of i.i.d. r.v. with the following ingenious argument: If one partitions $N = mn$ i.i.d. r.v. into $n$ disjoint
subsets containing $m$ r.v. each, then the largest of the $mn$ outcomes is equal to the largest of the $n$ largest outcomes
in each subset of size $m$

$S_{(mn)} = \max_{1 \le i \le n} S_{(m)}^{(i)}$ (26)

where $S_{(m)}^{(i)}$ is the largest outcome in the $i$th subset. Since the $S_{(m)}^{(i)}$ are themselves i.i.d. r.v., the c.d.f. $F_N(s)$ of $S_{(N)}$
obeys the exact recursion

$F_{mn}(s) = F_m(s)^n$ (27)

for all $m$ and $n$ ($m^{1/d}, n^{1/d} \in \mathbb{N}$). In terms of the normalized distribution (23), the recursion takes the form

$G_{mn}(z) = G_m\left(\frac{\sigma_{mn} z + \mu_{mn} - \mu_m}{\sigma_m}\right)^n$, (28)

which is essentially the subcritical RG for the normalized largest-cluster size distribution in percolation, but we must
also address correlations and discreteness. In going from (27) to (28) we have defined a "renormalized" order-parameter
distribution for percolation valid near the subcritical fixed point, in much the same way that the Kadanoff-Wilson
block-spin construction defines a renormalized coupling constant for the Ising model valid near the critical fixed
point [37].
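
The recursion (27) can be tried out on the Fisher-Tippett distribution itself. The sketch below checks numerically that raising $e^{-e^{-z}}$ to the $n$th power is undone by a shift of $\log n$ in its argument, i.e. that this c.d.f. is reproduced by the "maximum of $n$ cells" operation up to a translation. It works in un-normalized variables with unit scale, which is a simplification relative to (28); the helper names are hypothetical.

```python
import math


def gumbel_cdf(z):
    """Fisher-Tippett (Gumbel) c.d.f. with unit scale."""
    return math.exp(-math.exp(-z))


def renormalize(cdf, n, shift):
    """C.d.f. of the maximum of n independent copies, re-centred by `shift`:
    z -> cdf(z + shift)**n, the i.i.d. recursion (27) followed by a translation."""
    return lambda z: cdf(z + shift) ** n


if __name__ == "__main__":
    for n in (2, 5, 100):
        renormalized = renormalize(gumbel_cdf, n, math.log(n))
        worst = max(abs(renormalized(0.1 * k) - gumbel_cdf(0.1 * k))
                    for k in range(-30, 61))
        print(n, worst)  # differences at the level of rounding error
```

The required shift grows like $\log n$ while the scale is unchanged, which mirrors the logarithmic growth of the mean and the bounded standard deviation found in section II.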

The power of the cell-renormalization approach is that it provides a natural way to bound correlations and show
that the subcritical limit cycles in percolation are described by the same RG as in the case of independent random
variables (except for the subtle, persistent fluctuations due to discreteness described earlier). This is demonstrated
rigorously in Ref. [?], but here we simply explain the basic ideas of these authors. The strategy of the proof (inspired
by Fisher and Tippett) is to fix the number of cells $n > 1$ and let the size of each cell $m$ diverge. Since correlations
decay exponentially with distance in the subcritical regime, it seems plausible that the "renormalized" cell random
variables $S_{(m)}^{(i)}$ would become uncorrelated (as the surface-to-volume ratio vanishes) in the limit $m \to \infty$, at least if
the dimension were not too high ($d < d_c$).

Precise bounds on the intercell correlations can be obtained as follows [Q . If the cells were independent (with free
boundary conditions) we would have $F_{mn}(s) = F_m(s)^n$ as above, but due to correlations we have instead the upper
bound

$$ F_{mn}(s) \le F_m(s)^n \tag{29} $$

because joining the $n$ cells together (and thus allowing clusters to connect and grow) can only increase the size of
the largest cluster (and thus decrease the probability that the largest cluster has size $\le s$). A lower bound can be
obtained by considering a set of "supercells" (again with free boundary conditions) formed by appending a "skin" of
linear width $s/2$ to each of the original cells, as shown in Fig. 6. If the mass of the largest cluster intersected with
each of these overlapping supercells were independently $\le s$, then the largest cluster overall would also have mass $\le s$
(because even a linear chain of length $s$ would necessarily be completely contained in one supercell), which yields [25]

$$ F_{(m^{1/d}+s)^d}(s)^n \le F_{mn}(s) \le F_m(s)^n . \tag{30} $$
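
The upper bound (29) can be illustrated on small lattices. The sketch below is a toy under stated assumptions, not the simulation code behind the results reported here: it uses site percolation on a 30 x 30 square lattice with free boundaries, a breadth-first cluster search, and hypothetical function names. For every sample it compares the largest cluster of the whole lattice with the largest cluster found when the lattice is cut into n = 9 cells.

```python
import random
from collections import deque


def largest_cluster(occupied, rows, cols):
    """Largest nearest-neighbor cluster inside the window rows x cols.

    Free boundary conditions: sites outside the window are ignored.
    `rows` and `cols` are range objects.
    """
    seen = set()
    best = 0
    for r in rows:
        for c in cols:
            if not occupied[r][c] or (r, c) in seen:
                continue
            # Breadth-first search over occupied neighbors inside the window.
            size = 0
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (ny in rows and nx in cols and occupied[ny][nx]
                            and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            best = max(best, size)
    return best


def compare_cells(side, cells_per_side, p, rng):
    """Return (largest cluster of the whole lattice, largest over all cells)."""
    occupied = [[rng.random() < p for _ in range(side)] for _ in range(side)]
    whole = largest_cluster(occupied, range(side), range(side))
    w = side // cells_per_side
    cell_best = 0
    for i in range(cells_per_side):
        for j in range(cells_per_side):
            rows = range(i * w, (i + 1) * w)
            cols = range(j * w, (j + 1) * w)
            cell_best = max(cell_best, largest_cluster(occupied, rows, cols))
    return whole, cell_best


rng = random.Random(0)
results = [compare_cells(side=30, cells_per_side=3, p=0.4, rng=rng)
           for _ in range(200)]
# Joining the cells can only enlarge the largest cluster, sample by sample.
assert all(whole >= cells for whole, cells in results)
```

Because the inequality holds sample by sample, it carries over to the distributions: if the whole lattice has no cluster larger than s, neither does any cell. With free boundaries and independent site occupation the cells are themselves independent, so the probability that every cell satisfies the bound is the product appearing on the right-hand side of (29).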

These inequalities, which are valid for any dimension $d$, are the analogs of the Fréchet-Fisher-Tippett "RG" (27)
for subcritical percolation, and from them the Fisher-Tippett behavior of the subcritical limit cycles can be
established |Q . Heuristically, it is quite plausible that if the "typical" largest cluster size, say within $z$ standard deviations
of the mean,

$$ s_{mn}(z) = \mu_{mn} + z\,\sigma_{mn} , \tag{31} $$

does not grow too fast, i.e. $s_{mn}(z) \ll m^{1/d}$ as $m \to \infty$ with $n$ and $z$ fixed, then (30) should reduce asymptotically to
(27). Given the results of Borgs et al. (|))-(0), we actually expect the much stronger bound $s_{mn}(z) = O(\log m)$. As
explained below, this logarithmic scaling selects the Fisher-Tippett distribution $e^{-e^{-z}}$ from among the possible fixed
points of (27).
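
To see why logarithmic growth of the typical cluster size is more than enough, one can compare the volume of a supercell with that of the cell it contains. The following sketch is only illustrative: it takes the typical size to be exactly $s_\xi \log m$ (an assumption standing in for the expected $O(\log m)$ scaling), uses the Table I value of $s_\xi$ at $p = 0.15$, and the function name is hypothetical.

```python
import math


def supercell_volume_ratio(m, s, d):
    """Volume of a cell of m sites widened by a skin of s/2 on every side,
    divided by the volume m of the original cell."""
    return (1.0 + s / m ** (1.0 / d)) ** d


s_xi = 1.459   # Table I, p = 0.15
d = 2
cell_sizes = (10**2, 10**4, 10**6, 10**8)
ratios = [supercell_volume_ratio(m, s_xi * math.log(m), d) for m in cell_sizes]

for m, ratio in zip(cell_sizes, ratios):
    print(f"m = {m:>9d}   supercell/cell volume = {ratio:.4f}")
```

The ratio approaches one as the cells grow, so the lower and upper bounds in (30) refer to cells of nearly the same size.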

C. The Subcritical Fixed Point



The subcritical fixed point is described by the classical theory for extremes of i.i.d. r.v. ]i^ , pO| |. Following Fisher and
Tippett [17], let us assume for now that a continuous fixed point of (28) exists pointwise for all $z$, i.e. $G_N(z) \to G(z)$
as $N \to \infty$ and $p \to p_c$ such that = o(N). In this case, there must exist finite constants $a_n > 0$ and $b_n$ defined by

$$ \lim_{m \to \infty} \frac{\sigma_{mn}}{\sigma_m} = a_n \tag{32a} $$

$$ \lim_{m \to \infty} \frac{\mu_{mn} - \mu_m}{\sigma_m} = b_n \tag{32b} $$

such that the limiting distribution $G(z)$ obeys the equation [17]

$$ G(z) = G(a_n z + b_n)^n \tag{33} $$

which was first discovered by Fisher and Tippett [44]. This functional equation has exactly three solutions, given
in Table II, up to trivial translations and rescalings of $z$ by constants. In the case of i.i.d. r.v. the basins of
attraction of these three fixed points, which depend only on the tail of the parent distribution, were first characterized
by Gnedenko pj| . In the case of percolation, we have argued above that the appropriate parent distribution has
an exponential tail, which suggests that the Fisher-Tippett distribution is indeed the subcritical fixed point (again,
ignoring discreteness).
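
The three solutions can be checked directly against the functional equation (33). The sketch below is not from the original text: the explicit constants $a_n = n^{1/\alpha}$ (Fréchet) and $a_n = n^{-1/\alpha}$ (Weibull) are the standard values from extreme-value theory, whereas Table II only records that they are greater or less than one; the function names and the test values of $\alpha$ and $n$ are arbitrary choices.

```python
import math


def gumbel(z):
    """Fisher-Tippett (Gumbel) form exp(-exp(-z))."""
    return math.exp(-math.exp(-z))


def frechet(z, alpha):
    """Frechet form exp(-z^(-alpha)) on z > 0."""
    return math.exp(-z ** (-alpha)) if z > 0 else 0.0


def weibull(z, alpha):
    """Weibull form exp(-(-z)^alpha) on z < 0."""
    return math.exp(-(-z) ** alpha) if z < 0 else 1.0


def residual(G, a_n, b_n, n, points):
    """Largest violation of G(z) = G(a_n z + b_n)^n over the test points."""
    return max(abs(G(z) - G(a_n * z + b_n) ** n) for z in points)


alpha, n = 2.0, 7
res_gumbel = residual(gumbel, 1.0, math.log(n), n, [-2.0, 0.0, 1.5, 4.0])
res_frechet = residual(lambda z: frechet(z, alpha), n ** (1 / alpha), 0.0, n,
                       [0.3, 1.0, 5.0])
res_weibull = residual(lambda z: weibull(z, alpha), n ** (-1 / alpha), 0.0, n,
                       [-3.0, -1.0, -0.2])
print(res_gumbel, res_frechet, res_weibull)
```

All three residuals vanish to rounding error, while using, say, $b_n = 0$ with the Fisher-Tippett form does not satisfy the equation.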

Still assuming that a continuous limiting distribution $G(z)$ exists, let us make the following additional assumptions

$$ \mu_N \sim s_\xi \log N \tag{34a} $$

$$ \sigma_N - \sigma_{N-1} = o(1/N) \tag{34b} $$

which are clearly supported by our numerical simulations and are consistent with the rigorous results (0) and (|7]).
These scaling axioms are expected to hold for all $d < d_c$. Note that Eq. (34b) implies $\sigma_N = o(\log N) = o(\mu_N)$; with
the fact that $\sigma_N$ must be an increasing sequence, it also implies $a_n = 1$ for all $n \in \mathbb{N}$ (see [Q). From (34a) and (32b),
we have

$$ \frac{\mu_{mn} - \mu_m}{\sigma_m} \sim \frac{s_\xi \log n}{\sigma_m} \to b_n . \tag{35} $$
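
The step leading to (35) can be spelled out as follows. This is a sketch that assumes the corrections to the leading behavior (34a) are small compared with $\sigma_m$, which is not stated explicitly above:

```latex
\begin{align*}
\mu_{mn} - \mu_m
  &\approx s_\xi \log(mn) - s_\xi \log m \\
  &= s_\xi \log n , \\
\frac{\mu_{mn} - \mu_m}{\sigma_m}
  &\approx \frac{s_\xi \log n}{\sigma_m} .
\end{align*}
```

The $m$-dependence of the mean cancels, so whether the limit $b_n$ in (32b) is zero or finite depends only on the behavior of $\sigma_m$ at large $m$.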

There are two possibilities: $\sigma_m \to \infty$ and $\sigma_m \to \sigma$ for some constant $\sigma > 0$. In the former case, we have $b_n = 0$ and
hence $a_n \ne 1$ (see Table II), which is a contradiction. In the latter case, $b_n = (s_\xi/\sigma) \log n$. Without loss of generality
we can set $\sigma = s_\xi$ (since this simply amounts to rescaling $z$) and obtain the equation

$$ G(z) = G(z + \log n)^n \tag{36} $$

whose only nontrivial solution is $e^{-e^{-z}}$. This also implies that the standard deviation converges to a constant
proportional to the crossover size, $\sigma_N \to s_\xi \pi/\sqrt{6}$.
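
Both scaling statements, the logarithmic growth of the mean and the saturation of the standard deviation, can be checked exactly in the simplest idealization. The sketch below assumes $N$ independent, continuous exponential variables with mean $s_\xi$ (no correlations, no discreteness, no power-law prefactor), for which the mean and variance of the maximum are given by the harmonic sums $s_\xi \sum_k 1/k$ and $s_\xi^2 \sum_k 1/k^2$. The function name is hypothetical and the value of $s_\xi$ is taken from Table I.

```python
import math


def max_exponential_moments(N, s_xi):
    """Exact mean and standard deviation of the largest of N i.i.d.
    exponential variables with mean s_xi."""
    mean = s_xi * sum(1.0 / k for k in range(1, N + 1))
    var = s_xi ** 2 * sum(1.0 / k ** 2 for k in range(1, N + 1))
    return mean, math.sqrt(var)


N = 10**5
s_xi = 1.459  # Table I, p = 0.15
mean, std = max_exponential_moments(N, s_xi)

print("mean / s_xi - log N =", mean / s_xi - math.log(N))   # close to Euler's constant
print("std / s_xi          =", std / s_xi)                  # close to pi / sqrt(6)
```

In this idealization the offset of the mean tends to Euler's constant and the standard deviation tends to $s_\xi \pi/\sqrt{6}$, the mean and standard deviation of $e^{-e^{-z}}$ in units of $s_\xi$.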



D. The Subcritical Limit Cycles 

Of course, the assumption of pointwise convergence to a continuous limiting distribution is wrong (e.g. see Fig. 1).
Nevertheless, the conclusions of our simple derivation are not very different from those of a rigorous analysis including
correlations and discreteness |Q. Note that although a limiting distribution $G(z) = \lim G_N(z)$ does not exist, the
envelope functions $\underline{G}(z) = \liminf G_N(z)$ and $\overline{G}(z) = \limsup G_N(z)$ do exist. Assuming that $\underline{G}(z)$ and $\overline{G}(z)$ are
continuous (although $G_N(z)$ is not), it can be shown from (34) that the envelope functions have the Fisher-Tippett
form

$$ \underline{G}(z - z_1) = \overline{G}(z - z_2) = e^{-e^{-z}} \tag{37} $$

for some constants $-\infty < z_1 < z_2 < \infty$ and that the variance is bounded on the scale of the crossover size, $\sigma_m/s_\xi = O(1)$.
The latter result supports our assumption above in fitting the simulation data to (22). The reader is referred
to Ref. [13] for a detailed proof of (37), which follows the RG strategy outlined here. The heuristic arguments and
simulation results in sections II and III also lead us to conjecture that the "envelope width" $z_2 - z_1$ is simply set by
the "strength" of the discreteness, i.e. the ratio of the lattice cell volume ($a^d = 1$) to the crossover size,

$$ z_2 - z_1 = \frac{a^d}{s_\xi} , \tag{38} $$

which vanishes in the limit $p \to p_c$.
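
The origin of the limit cycles can be seen in a toy model with discreteness but no correlations. The following sketch is an illustration under strong assumptions, not the percolation problem itself: the cluster sizes are replaced by $N$ independent integers with the geometric tail $P(X > s) = q^s$, with $q = 1/2$ (so that the crossover size is $1/\log 2$), and the function names are hypothetical. Multiplying $N$ by $1/q$ shifts the c.d.f. of the maximum by exactly one lattice unit, so the distribution returns to the same shape periodically in $\log N$ instead of converging.

```python
def cdf_max_geometric(s, N, q):
    """c.d.f. at integer s >= 0 of the largest of N i.i.d. integers
    with tail P(X > s) = q**s."""
    return (1.0 - q ** s) ** N


def shape_mismatch(N1, N2, q, shift):
    """Largest difference between the c.d.f. for N1 and the c.d.f. for N2
    translated by `shift` lattice units."""
    return max(abs(cdf_max_geometric(s, N1, q) - cdf_max_geometric(s + shift, N2, q))
               for s in range(2, 60))


q = 0.5
N = 2 ** 20

# One full period: N -> N / q reproduces the same c.d.f., shifted by one unit.
period_gap = shape_mismatch(N, 2 * N, q, shift=1)

# Part of a period: N -> 1.5 N cannot be undone by any integer shift.
partial_gap = min(shape_mismatch(N, 3 * N // 2, q, shift=k) for k in range(-2, 3))

print("mismatch after a full period:   ", period_gap)
print("mismatch after part of a period:", partial_gap)
```

The first mismatch is negligible while the second is of order 0.1, which is the sense in which the normalized distributions cycle between envelopes rather than approach a single continuous limit.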

V. CONCLUSION



In this article, a heuristic theory of the finite-size scaling of the largest-cluster size in subcritical percolation is 
presented and supported by numerical simulations. As expected away from a critical point, correlations are weak 
enough that a classical limiting distribution from the theory of extremes of independent random variables is recovered 
once the system size greatly exceeds the correlation length. This behavior can be easily understood via a
cell-renormalization scheme, which also provides a suitable framework for rigorous analysis. Work is underway to extend
this analysis to the supercritical case, where another classical limiting distribution arises, and the critical case, which
involves a new universality class.

    p       $s_\xi$            $\theta$
    0.05    0.603 ± 0.005      1.0 ± 0.1
    0.10    0.976 ± 0.001      0.97 ± 0.05
    0.15    1.459 ± 0.001      0.99 ± 0.02
    0.20    2.156 ± 0.001      1.03 ± 0.01
    0.25    3.226 ± 0.001      1.075 ± 0.005
    0.30    4.987 ± 0.002      1.109 ± 0.004
    0.35    8.156 ± 0.005      1.129 ± 0.005
    0.40    14.63 ± 0.03       1.13 ± 0.03
    0.45    31.4 ± 0.1         1.20 ± 0.03
    0.50    91.5 ± 0.2         1.20 ± 0.03

TABLE I. The measured correlation size $s_\xi(p)$ and exponent $\theta(p)$ for site percolation on the
$d = 2$ square lattice.



    Name             $G(z)$                 Range                 $a_n$    $b_n$       Basin of Attraction
    Fréchet          $e^{-z^{-\alpha}}$     $(0, \infty)$         $> 1$    $0$         power-law tails
    Weibull          $e^{-(-z)^{\alpha}}$   $(-\infty, 0)$        $< 1$    $0$         finite tails
    Fisher-Tippett   $e^{-e^{-z}}$          $(-\infty, \infty)$   $1$      $\log n$    exponential tails

TABLE II. Summary of solutions to the Fisher-Tippett equation. In the last column, parent
probability distributions $p_1(x)$ in the basins of attraction of each fixed point are (roughly) described
by their decay at large $x$. (See Refs. [18]-[20] for more details.)

FIG. 1. The discrete c.d.f. in (a) and p.d.f. in (b) of the largest-cluster size for $p = 0.15$ and
$N = 982^2, 2434^2, 5500^2$, normalized to have mean $\gamma$ and variance $\pi^2/6$. The c.d.f.s in (a) are
compared with (16), where = $0.90 \cdot s_\xi(0.15) = 1.313$.

FIG. 2. The mean largest cluster size plotted as $\mu/s_\xi$ versus $N/s_\xi^{d/D}$ on a log-linear plot in (a)
and a log-log plot in (b). The solid line fits the $p > 0.30$ data to Eq. (21) with asymptotic forms
given by the dotted lines. The raw data is in the inset of (a); the legend in (b) applies throughout.

FIG. 3. The standard deviation of the largest cluster size plotted exactly as in Fig. 2. In this
case, the $p > 0.30$ data is fit to Eq. (22).

FIG. 4. The mean largest-cluster size plotted as $\mu/s_\xi$ versus

FIG. 5. Sketch of the trajectories (dashed lines) of the normalized largest-cluster size c.d.f. $G_N(z)$
(in some appropriate function space) for $p < p_c$. Arrows indicate directions of flow as $N \to \infty$.
The physical manifold is shown for $N = 1$ and three larger values of $N$ (solid lines). Also shown
are three fixed points (thick dots), the subcritical envelope manifolds (short dotted lines) and the
crossover manifold (long dotted line).

FIG. 6. Sketch of $n = 9$ cells (solid lines) used for subcritical renormalization on a square lattice
of size $N = mn$. Along with the nine partitioning cells, one enlarged "supercell" (dashed lines)
used in Ref. [13] to bound correlations for cluster sizes smaller than $s$ (as described in the main
text) is also shown.